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Abstract 

■'  Newton ’s  method  has  proved  to  be  a  very  efficient  method  for  solving  st  rictly 
convex  unconstrained  minimization  problems.  For  the  nonconvex  case,  various 
modified  Newton  methods  have  been  proposed. 

In  this  paper,  a  new  modified  Newton  method  is  presented.  The  method  is 
a  linesearch  method,  utilizing  the  Choleskv  factorization  of  a  positive-definite 
portion  of  the  Hessian  matrix.  The  search  direction  is  defined  as  a  linear  combi¬ 
nation  of  a  descent  direction  and  a  direct  ion  of  negative  curvature.  Theoretical 
properties  of  the  method  are  established  and  its  behaviour  is  studied  when 
applied  to  a  set  of  test  problems.  /' 

‘  Up .  v 

Keywords:  Unconstrained  minimization,  modified  Newton  method,  negative 
curvature,  Cholesky  factorization,  linesearch,  steplength  algorithm 


I.  Introduction 

In  this  paper  we  propose  a  method  for  finding  a  local  minimizor  of  the  problem 

minimize  f{x), 
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where  /  is  a  twice-continuousiy  differentiable  function.  This  fundamental  problem 
has  been  studied  extensively  and  various  methods  have  been  proposed  that  use  first 
and  second  derivatives.  The  aim  is  to  generate  a  sequence  of  iterates  {x/c}fci0 
converge  to  a  point  x  satisfying  the  first-  and  second-order  necessary  conditions,  i.e., 
V/(x)  is  zero  and  V2/(x)  is  positive  semidefinite. 

Most  methods  that  utilize  second-derivative  information  may  be  viewed  as  exten¬ 
sions  of  Newton’s  method,  in  the  sense  that  they  are  identical  to  Newton’s  method 
in  a  neighbourhood  where  the  Hessian  is  positive  definite.  If  the  Hessian  is  not 
positive  definite  at  some  iterate,  the  Newton  step  may  not  reduce  the  objective 
function.  Consequently,  if  the  method  is  required  to  generate  a  sequence  of  improv¬ 
ing  estimates,  some  modification  is  needed.  Such  modified  Newton  methods  have 
been  studied  for  two  decades,  see  for  example  Fiacco  and  McCormick  [FM68],  Gill 
and  Murray  [GM74J,  McCormick  [McC77],  Fletcher  and  Freeman  [t  k  m j,  Muka/ 
and  Polak  [MP78],  Kaniel  and  Dax  [KD79],  More  and  Sorensen  [MS79]  and  Gold- 
farb  [G0I8O]. 

Most  modified  Newton  methods  solve  equations  using  a  factorization  of  the  lies 
sian.  The  method  proposed  by  Gill  and  Murray  [GM74]  uses  a  modified  Cholesky 
algorithm,  in  which  a  diagonal  matrix  is  implicitly  added  to  the  H-  ssian  to  make 
it  positive  definite.  A  similar  modified  Cholesky  algorithm  based  on  an  alternative 
diagonal  correction  has  been  proposed  by  Schnabel  and  Eskow  [SE88].  The  methods 
proposed  by  Fletcher  and  Freeman  [FF77]  and  More  and  Sorensen  [MS79]  use  the 
Bunch-Parlett-Kaufman  factorization  of  the  Hessian  (see  [BP71],  [BK77]). 

In  the  method  proposed  in  this  paper,  the  Cholesky  algorithm  with  complete 
pivoting  is  performed  until  all  potential  pivot  elements  are  smaller  than  a  preassigned 
tolerance.  The  Cholesky  factor  is  used  to  obtain  a  search  direction,  which  may  be 
a  linear  combination  of  a  descent  direction  and  a  direction  of  negative  curvature. 
It  is  shown  that  the  gradient  is  zero  and  the  smallest  eigenvalue  of  the  Hessian 
is  bounded  below  by  a  small  negative  number  at  all  limit  points  of  the  iterative 
sequence.  The  magnitude  of  the  bound  may  be  predetermined  by  adjusting  certain 
preassigned  tolerances. 

2.  Basics 

2.1.  Assumptions 

The  following  assumptions  are  made  throughout  the  paper: 

Al.  The  objective  function  is  twice  continuously  differentiable. 

A2.  The  level  set  .S’(x0)  =  {x  :  f(x)  <  f(x0)}  associated  with  the  starting  point  Xy 
is  compact. 

2.2.  Preassigned  parameters 

The  proposed  method  depends  on  seven  preassigned  scalar  parameters.  These  pa¬ 
rameters  specify  different  tolerances  and  for  reference,  their  purpose  and  range  of 
values  are  are  briefly  summarized  here. 
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is  a  parameter  reeded  for  the  Cholesky  factorization.  It 
is  used  to  determine  the  dimension  of  the  positive-definite 
portion  of  the  Hessian. 

is  a  parameter  used  to  reject  small  pivots  in  the  factorization. 
No  pivot  elements  smaller  than  (2hmin  are  accepted, 
define  an  acceptable  interval  for  the  initial  steplength. 
specifies  a  tolerance  associated  with  the  direction  of  negative 
curvature. 

is  a  parameter  used  in  the  linesearch  to  guarantee  a  sufficient 
decrease  in  /. 

is  a  parameter  used  to  determine  the  rate  of  decrease  of  the 
steplength  in  the  backtracking  linesearch. 


2.3.  Terminology 

T  he  idea  of  a  descent  direction  and  a  direction  of  negative  curvature  are  important 
when  computing  the  search  direction.  A  vector  p  is  a  descent  direction  at  a  point  x  if 
V f(x)7p  <  0.  Likewise,  p  is  a  direction  of  negative  curvature  at  x  if  pTV2f(x)p  <  0. 
Given  a  symmetric  matrix 


A'  = 


T  NT  \ 
N  G  )  ’ 


with  T  nonsingular,  the  Schxtr  complement  of  T  in  K  will  be  denoted  by  K/T,  and 
is  defined  as 

K/T  =  G-  NT~'Nt. 

'l'he  matrix  K/T  will  be  referred  to  as  “the”  Schur  complement,  when  the  matrix 
T  is  clear  from  the  context,  for  further  discussion  of  the  Schur  complement,  see 
Cottle  [Cot7d]. 

Throughout  the  paper,  the  subscript  k  denotes  the  iteration  index,  and  sub¬ 
scripts  i  and  j  denote  particular  components  or  columns  of  a  matrix  or  vector. 
When  element  i,  j  of  a  matrix  II k  is  addressed,  we  refer  to  it  as  htJ — i.e.,  the  low¬ 
ercase  letter  is  used  and  the  iteration  subscript  is  dropped.  Also,  for  vectors  and 
matrices,  when  the  term  norm  is  used,  we  mean  the  Euclidean  vector  norm  and  the 
corresponding  induced  matrix  norm. 


3.  Preliminary  Discussion 

At  the  k- th  iteration  of  the  proposed  method,  Xk  denotes  the  current  iteration  point, 
<)k  denotes  V/(.Tfc)  and  Ilk  denotes  V2/(x/j).  With  Newton's  method  as  the  model, 
it  is  desirable  to  compute  the  Newton  search  direction  whenever  Ilk  is  sufficiently 
positive  definite.  If  Ilk  is  known  to  be  positive  definite,  such  a  direction  may  be 
computed  using  the  Cholesky  factor  of  the  Hessian.  Whenever  the  Hessian  is  not 
sufficiently  positive  definite,  the  method  presented  here  is  based  on  the  Cholesky 
factorization  of  a  subset  of  the  rows  and  columns  of  Ilk ■  Complete  pivoting  is 
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used,  that  is,  the  maximum  diagonal  element  is  chosen  as  the  pivot  at  each  step. 
Suppose  that  n\  steps  of  the  factorization  have  been  performed  and  let  77  denote 
the  permutation  matrix  representing  the  column  interchanges.  We  have 


nTiikn  =  (  Hn 

\  7/21 


Hu  \ 
H22  ) 


and  HTgk  = 


(3.1) 


where  Hu  is  a  positive-definite  prinicipal  submatrix  of  order  nj,  with  Cholesky 
factor  Tin.  If  Rn  =  R\\  Hl2,  we  obtain  the  identity 


nTHkn  = 


Rn 

rTo 


r  0 
0  nJHkn j Hw 


Tin 

0 


R\2 

I 


where  nTHkH/Hu  is  the  Schur  complement  H22  ~  H2iH1]l  Hl2. 

In  order  to  simplify  the  notation,  we  shall  assume  that  no  permutations  are 
required.  This  implies  that  77  is  an  identity  matrix,  and  consequently  nTIlkll  -  Hk. 
We  emphasize  that  this  does  not  alter  the  theoretical  results  of  later  sections. 

The  factorization  is  usually  terminated  when  all  potential  pivot  elements  in 
Hk/H\\  are  smaller  than  a  tolerance  c2max,{/i„}.  However,  if  all  diagonal  elements 
of  the  Hessian  are  small  or  negative,  the  pivot  tolerance  is  given  by  (2hmin  for  a 
preassigned  positive  constant  hmin.  Consequently,  the  pivot  tolerance  is  defined  as 
c2hk,  where 

hk  =  max{max{/i„},/inun}-  (3.-2) 


The  Cholesky  factor  is  computed  by  rows,  and  the  Schur  complement  is  explic¬ 
itly  updated  at  each  step  of  the  factorization.  Consequently,  if  the  factorization  is 
terminated  with  rn  <  n,  the  elements  of  the  final  Schur  complement  are  known. 
Moreover,  since  we  control  the  smallest  acceptable  pivot  element,  we  have  an  upper 
bound  on  the  diagonal  elements  of  the  n2  x  ri2  matrix  Hk/ Hn.  These  properties  of 
the  factorization  will  prove  important  when  computing  directions  of  negative  cur¬ 
vature.  It  is  important  to  note  that  the  dimensions  of  the  matrices  Hu,  Hu,  H  2\ 
and  H22  depend  on  k. 

The  n  X  n\  matrix  Z  is  defined  to  be 

z  ’  ( 0 ) '  (3:,i 


where  the  matrix  7  is  an  n  1  x  nj  identity  matrix.  The  n  x  n2  matrix  Y  is  defined 
to  be 


Y 


H\\  77.2  y 


(3-1 ) 


where  we  let  y;  denote  the  j-th  column  of  Y . 
Lemma  3.1.  The  following  relations  hold: 


ZTHkZ  =  Hu, 

ZTHkY  =  0  and 
YrHkY  =  H22-  H2XHjHl2. 
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Proof.  The  result  follows  from  substituting  for  Hk/Hn,  Z  and  Y  using  (3.1),  (3.3) 
and  (3.4).  | 

Again,  we  emphasize  that  the  dimensions  of  Z  and  Y  depend  on  k.  The  following 
lemma  shows  that  the  columns  of  the  nx  n  matrix  M  =  ^  Z  Y  'j  form  a  basis  for 
ft". 


Lemma  3.2.  The  n  X  n  matrix  M  is  nonsingular. 

Proof.  The  result  is  immediate  from  the  fact  that  det(M)  =  1.  | 

The  following  lemma  relates  the  smallest  eigenvalue  of  Hk  to  the  smallest  eigen¬ 
value  of  HkjH\\- 


Lemma  3.3.  If  Hk  is  indefinite  then 

Xmin(Hk/  H\  1  )  <  ^min  {H  k)  < 


ny  ||2  k / H u  ). 


Proof.  It  follows  from  Lemma  3.1  that  Hk/Hu  =  YTHkY -  Let  u  denote  an  eigen¬ 
vector  of  unit  length  corresponding  to  the  smallest  eigenvalue  of  YTHkY .  It  follows 
from  (3.4)  that  uTYTYu  >  1.  Sylvester’s  law  of  inertia  yields  A„„ n(YrHkY)  <  0, 
giving 

o  >  Xmin(YTIIkY)  =  uTYTHkYu  =  ^-^~uTYTYu. 

uJ  YJ  Yu 

The  proof  of  the  second  inequality  is  completed  by  noting  that 


0>^>A 

uTYTYu  -  r 


.(Hk) 


and 


(3.5) 


utYtYu  <  ||y||2. 

Using  the  Courant-Fischer  minimax  characterization  of  eigenvalues  (see  e.g., 
Wilkinson  [Wil65,  page  101]),  it  follows  that  the  smallest  eigenvalue  of  Hk  is  the 
global  minimum  of  the  problem 

minimize  vTHk  v 

ve»n 

subject  to  vTv  =  1. 

Lemma  3.2  implies  the  existence  of  vectors  vz  and  vY  such  that  v  =  Zvz  +  Yvy. 
Substituting  for  v  in  (3.5),  and  using  the  identity  ZTHkY  =  0  yields  the  problem 

minimize  vTZTHkZv7  +  vT.YTHkYvv 

subject  to  v^ZTZvz  +  2v^ZtYvy  +  v$YtYvy  =  1. 

By  definition,  ZTHkZ  =  H\\  is  positive  definite,  and  it  follows  that  the  global 
minimum  of  (3.6)  is  no  smaller  than  the  global  minimum  of  the  problem 

minimize  vYYTHkYvv 

dz£3?”I  ,l)y€»n2 

subject  to  vzZTZvz  +  2v^ZTi'vY  +  v^YtYvy  =  1. 


(3.6) 


(3.7) 
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Since  this  is  a  problem  where  the  gradient  of  the  constraint  is  nonzero  at  all  feasible 
points,  the  constraint  qualification  always  holds.  Therefore,  if  vz  and  vY  are  global 
minimizers,  there  must  exist  a  Lagrange  multiplier  o  such  that  the  equations 

u{ZTZvz  +  ZtYvy)  =  0  (3.8a) 

YTHkYvy  +  u{YTZvz  +  YtYvy)  =  0  (3.8b) 

vTzZTZvz  +  2  vtzZtYvy  +  vtyYtYvy  =  1  (3.8c) 


are  satisfied. 

The  global  minimum  of  (3.6)  is  negative,  so  that  the  global  minimum  of  (3.7)  is 
also  negative.  If  v  is  zero,  it  follows  from  (3.8b)  that  the  global  minimum  of  (3.7) 
is  zero,  which  is  a  contradiction.  Therefore,  (3.8a)  implies  that  vz  is  determined  by 
vy,  with 

vz  =  -(ZtZ)~xZtYvy. 

Using  this  value  of  vz  and  the  definitions  of  Z  and  Y ,  problem  (3.7)  is  equivalent 
to  the  problem 

minimize  vZYTHkYvv 

Y  (3.9) 

subject  to  VyVy  =  1. 

The  proof  of  the  first  inequality  is  completed  by  noting  that  the  global  minimum  of 
(3.9)  is  the  smallest  eigenvalue  of  YTHkY .  | 

We  also  require  a  result  that  relates  the  smallest  eigenvalue  of  a  symmetric 
matrix  to  the  magnitude  of  its  elements. 

Lemma  3.4.  If  all  elements  of  an  n  x  n  symmetric  matrix  A  have  absolute  values 
less  than  p,  no  eigenvalue  of  A  has  absolute  value  larger  than  pn. 

Proof.  This  is  an  immediate  consequence  of  the  Gerschgorin  circle  theorem — see 
e.g.,  Golub  and  Van  Loan  [GV83,  page  200].  I 


4.  The  Cholesky  Factorization 

At  each  iterate,  a  positive-definite  principal  minor  of  the  Hessian  is  factorized  as 
outlined  in  the  previous  section.  Some  standard  results  concerning  the  Cholesky 
factorization  are  needed  to  derive  uniform  bounds  on  ||//j~11||-  These  results  are 
reviewed  in  this  section.  For  a  complete  discussion  of  the  Cholesky  factorization, 
see  Higham  [If ig87] . 

Lemma  4.1.  If  a  positive-definite  nxn  matrix  A  is  factorized  using  the  Cholesky 
algorithm  with  complete  pivoting ,  the  elements  of  the  Cholesky  factor  R  have  tin 
following  properties: 

rn  >  r22  >  ■  •  ■  >  rnn,  (1.1a) 

|  r,_,  |  <  n,  for  j  =  1, . . .  ,n,  t=l,...,j-l.  (lib) 
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Proof.  For  any  j  >  i  the  complete  pivoting  strategy  yields 

j-i 

2  s'  _2  V-'  -2 

rn  —  r»  L-*  rtj  ’ 

l=i 

from  which  (4.1a)  follows.  Since  A  is  positive  definite,  it  holds  that  r13  is  positive, 
and  therefore  r?  >  rfj.  | 


Lemma  4.2.  If  R  is  the  Cholesky  factor  of  an  n  x  n  symmetric  positive-definite 
matrix  obtained  by  complete  pivoting,  the  elements  of  its  inverse  U  have  the  following 
properties. 


I  Uij  (  < 


v 


-i-i 


for  j  =  l,...,n,  *  =  1 ,  — ,  J  —  1 


•n 


u3J  =  —  for  j  =  l,...,n 


'  33 


Uij  =  0 


for  j  =  l,...,n,  i  =  j  +  l,...,n. 


Proof.  The  matrix  U  satisfies  the  equation  RU  =  /.  The  j-th  column  of  this 
equation  gives 


=  0 


if  t  >  j 


U33  — 


' 33 


Lemma  4.1  implies  that 


1  ^  •  • 
U,j  = - >  TnUlj  if  l  <  ]. 

r"  l=it  1 


luwi  <  H  lUijl  if  *  <  i- 

l=i  + 1 


By  induction,  it  follows  that 


l«ul  < 


V 


-i-l 


if  i  <  j. 


'33 


This  bound  on  the  element  growth  is  usually  unduly  pessimistic.  However,  for 
certain  special  matrices,  substantial  element  growth  may  occur — see  e.g.,  Higham 
[Hig87,  page  6].  What  is  important  here  is  the  existence  of  a  bound.  Such  a  bound 
is  needed  in  order  to  obtain  a  uniform  bound  on  )| /f || . 

Lemma  4.3.  There  exists  a  positive  constant  Co,  such  that  for  all  k,  ||H1~11||  <  co- 
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Proof.  Since  no  pivot  element  smaller  than  e2/^,,  is  accepted  in  the  Cholesky 
factorization  we  have  rn,ni  >  £\//i  mjn.  Lemma  4.2  implies  that 


( ^11  )ij  S 


2n 


for  i  =  1 . ri!  and  j  =  1 , . . . ,  n  \ . 


CV^min 

The  identity  Ilf*  =  Rf^Rf^  and  Lemma  3.4  yield  the  desired  result.  I 


5.  Computation  of  the  Search  Direction 

In  the  proposed  method  a  search  direction  pk  is  computed  at  the  k-th  iterate.  The 
vector  pk  is  defined  in  terms  of  two  other  vectors;  a  descent  direction  sk  and  a 
direction  of  negative  curvature  <4- 


5.1.  Computation  of  the  descent  direction 

The  descent  direction  Sk  satisfies  the  equation 


where 


Bk«k  =  ~9k- 


Bk 


II  n  0  ^ 

0  hkI  )  ' 


(5.1) 


(5.2) 


and  hk  is  defined  by  (3.2). 

If  ni  =  n,  then  Bk  =  Hk  and  is  the  Newton  direction.  If  n2  =  n  then 
Bk  =  h„unt  and  sk  >s  a  multiple  of  the  steepest-descent  direction.  (In  general.  77 j 
need  not  be  equal  to  the  number  of  positive  eigenvalues  of  Hk-  For  example,  the 
matrix  /  —  eeT ,  where  e  denotes  an  n-vector  with  unit  components,  has  71— 1  positive 
eigenvalues,  but  7ii  =  0.  However,  the  results  reported  in  Section  8  include  only  one 
case  where  ni  was  zero.) 

The  vector  .s k  of  (5.1)  is  computed  by  solving  the  triangular  systems 


RIuu  =  -gi  and  Ruv  =  u,  with  sk  = 

\  -(1  ffik)92  j 

When  Sk  is  computed  from  these  equations,  the  norms  of  sk  and  gk  are  related  in  a 
uniform  way.  I' he  following  lemma  shows  that  sk  satisfies  descent  properties  similar 
to  those  required  by  McCormick  [Mc.C77]. 

Lemma  5.1.  // sk  is  defined  by  (5.1)  there  exist  positive  constants  C]  and  c2  inde¬ 
pendent  of  k,  such  that  for  all  k,  it  holds  that 

-&k9k  >  cill^ll2  and  ||sfc||  >  c2||sfc||. 


5.  Computation  of  the  Search  Direction 
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Proof.  The  definition  of  B k  yields 

INI  > - fiTH-iiT/iTi.' 

max{|lffn  ||.(1/A*)} 

The  bound  on  ||/f1-11||  obtained  from  Lemma  4.3  implies  the  existence  of  c2. 

The  definition  of  s k  yields 

-sjgk  >  min{Anlin(//f11),(l/hjt)}||5/fcjj2. 

Since  Hu  is  positive  definite  and  symmetric,  we  may  employ  the  identity 

Ami„(//r,,)  =  A  max(Hu)  =  Wx F 

The  compactness  of  5(iq)  and  the  smoothness  of  /  ensure  the  existence  of  c^  | 


5.2.  Computation  of  the  direction  of  negative  curvature 

The  formula  for  dk  is  derived  from  a  method  for  computing  directions  of  negative 
curvature  in  quadratic  programming  (see  Forsgren  et  al.  [FGM89]).  If  the  variables 
corresponding  to  //2 2  are  temporarily  locked  at  their  current  values,  a  direction  of 
negative  curvature  is  defined  by  releasing  one  or  two  of  the  locked  variables.  This 
scheme  corresponds  to  using  either  y,  or  y,  ±  y_,  as  a  direction  of  negative  curvature 
for  a  specific  choice  of  i  and  j.  The  choice  of  i  and  j  is  determined  by  the  values  of 
the  elements  of  Hk/Hu.  When  the  factorization  of  II k  is  terminated,  these  elements 
yTHkVj  are  known  for  1  <  i,  j  <  n2,  without  explicitly  computing  the  vectors  y, 
(see  Lemma  3. 1 ). 


Let  g  £  (0,1)  denote  a  preassigned  constant  and  let  p  denote  max, 
The  vector  dk  is  computed  as  follows. 

,j  \yjHky}\- 

if  p  <  e 2hk/rj  then 

dk  =  0 

(5.3a) 

else  if  y[Hkyx  =  -p  for  some  i  then 

dk  =  ±y, 

(5.3b) 

else  if  |y,r//fcy;|  =  p  for  some  1  j  then 

1  T 

dk  =  =(y,  -  sgn(y,  IIky})yj) 

(5.3c) 

end  if 


In  each  case,  we  choose  the  sign  of  dk  so  that  gkdk  <  0. 

In  the  Cholesky  algorithm,  the  pivot  elements  are  chosen  from  the  diagonal 
of  the  Schur  complement,  and  it  follows  that  yjllky,  <  (2hk  for  i  =  1,  ...,  n2. 
Consequently,  if  p  >  (2hk/r)  then  y]Hky,  <  P  for  all  i  and  dk  is  well  defined. 
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In  order  to  obtain  dk,  it  is  necessary  to  compute  y,  or  y,  ±  yr  'I  his  is  done  in- 
solving  an  equation  involving  11\\  and  R\2-  For  example,  the  computation  of  //,  4-  yj 
requires  the  solution  of  the  equation 

1  f  “  ) 

+  cj)  w',,»  y<  +  y}  =  1  ,  .  •  (5. 1) 

V5  \75,'+0,J 

The  following  lemma  shows  that  any  nonzero  dk  is  a  direction  of  negative  cur¬ 
vature. 


Lemma  5.2.  If  dk  is  nonzero  then 

A<0  jr„i4tSJ±^p±. 

Proof.  In  each  case,  the  sign  of  dk  is  chosen  so  that  <Jkdk  <  0.  I.et  p  denote 
max,j  I yfihyjl  if  4  is  given  by  (5.3b),  then  djll  kdk  =  yj II k yx  ~  -p.  If  dk  is  given 
by  (5.3c),  then 

dkHkdk  =  \(yJuky>  +  yjtiky3)-  \ylfhvj\- 

where  \y[Kkyj\  =  p-  Since  y] II kyx  and  yjll kyJ  are  both  less  than  or  equal  to  c ihk. 
it  holds  that  djHkdk  <  e2hk  —  p.  The  inequality  p  >  c2hk/i)  implies  that  in  either 
case 

x  (1 

d[Ukdk  <  --- 

as  required.  | 


Finally,  we  relate  the  curvature  along  any  nonzero  dk  to  the  smallest  eigenvalue 
of  IIk. 


Lemma  5.3.  If  dk  is  nonzero,  there  exists  a  positive  constant  c.j,  independent  of  k. 
such  that  for  all  k 


dIdk 


~  ^3  ^min  (iik)- 


Proof.  Let  p  denote  maxtiJ  \y'{llky  |.  If  dk  is  nonzero,  it  follows  from  the  proof  of 
Lemma  5.2  that  d[llkdk  <  —  (1  —  r ))p.  Lemma  3.1  implies  that  Amin(  Hkj  H\\  )  >  -pn, 
hence 

d'kUkdk  <  ^pAnml( //,///.!)• 

From  Lemma  3.3  we  have 


‘jlMk 

dldk 


<  ’  ~  1] 

—  jT  i  'Minn 

"  dkdk 


(!ik)- 


Now  (5.3b),  (5.3c)  and  (5.1)  may  be  used  to  obtain 

i<d[d,  <||r7V||<i  +  ||//,I||a||ffr1,||2. 
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The  uniform  boundedness  of  | j rf ^ 1 1  now  follows  from  Lemma  4.3  and  the  assumptions 
on  /.  | 

The  significance  of  this  lemma  is  that  a  nonzero  dk  cannot  be  an  arbitrarily  poor 
direction  of  negative  curvature  compared  to  the  eigenvector  corresponding  to  the 
smallest  eigenvalue  of  Hk  (which  is  the  best  possible  choice).  The  vector  dk  may  be 
zero  even  if  Hk  is  indefinite.  However,  when  dk  is  zero,  the  following  lemma  gives  a 
bound  on  the  indefiniteness  of  Hk . 

Lemma  5.4.  If  dk  =  0  then  A,,^//*)  >  -ne2kk/rj. 

Proof.  If  «2  -  0,  then  Hk  is  positive  definite  and  A >  0.  Assume  that 
n2  >  0.  Since  dk  —  0,  it  holds  that  p  <  (2hkjr).  This  result,  together  with  Lemma  3.4 
implies  that 

Anun(tffc///ll)  >  -n2p>-^-^, 

V 

and  it  follows  from  Lemma  3.3  that 

I 

5.3.  Computation  of  the  search  direction 
The  search  direction  pk  is  defined  to  be 


Pk  =  sk  +  0kdk,  (5.5) 

where  the  scalar  0k  is  defined  as  follows: 


if  dk  ^  0 

and  sjHksk  >  djllkdk  then 

sTlI.dk 

/  slH.d 

1  slllkSk 

A  _  k  k  k  i 

0  ~  4», A  \ 

[  k  k  k 

\dlHkdki 

+  1  k  k  k 

' 

(5.6a) 

else 

O 

II 

-sc 

(5.6b) 

end  if 

Note  that  if  n2  =  0  (i.e.,  if  Hk  is  sufficiently  positive  definite),  pk  is  the  Newton 
direction.  The  choice  of  0k  is  important  only  if  dk  is  nonzero.  The  particular  choice 
of  0k  given  above  is  motivated  by  the  following  lemma,  which  also  shows  that  if  dk 
is  nonzero,  pk  is  also  a  direction  of  negative  curvature. 

Lemma  5.5.  If  dk  is  nonzero,  then  0k  >  0  and  pJlfkPk  -  dI^kdk- 


/ 
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Proof.  If  <4  ^  0  and  0k  =  0  then  p[HkPk  —  s\Hksk  <  dJ/Z/td*..  If  <fjt  7^  0  and 
0k  ^  0  it  follows  from  the  definition  of  0k  that  d[ZZfcdfc  <  s^H ksk  and  the  square 
root  in  (5.6a)  is  well  defined.  In  this  case  0k  is  the  unique  positive  number  that 
satisfies  the  quadratic  equation  ( Sk  +  0kdk)THk(sk  +  0kdk )  =  d^ZZ^d^.  • 

The  following  lemma  shows  that  the  norm  of  Pk  is  uniformly  bounded. 

Lemma  5.6.  If  Pk  is  defined  by  (5.5),  ||p^ ||  is  uniformly  bounded. 

Proof.  Lemma  5.1  and  the  compactness  of  S(x0)  imply  that  ||sfc||  is  uniformly 
bounded. 

From  the  proof  of  Lemma  5.3  it  follows  that  ||<4||  is  uniformly  bounded.  Lemma 
5.2  guarantees  that  the  denominator  of  (5. 6a)  is  uniformly  bounded  away  from  zero. 
Since  /  £  C2  and  the  level  set  S(x 0)  is  compact,  it  follows  that  0k  is  uniformly 
bounded,  as  required.  | 

One  consequence  of  Lemma  5.6  is  that  if  dk  is  nonzero,  pk  cannot  be  an  arbitrarily 
poor  direction  of  negative  curvature. 


6.  Computation  of  the  Iterates 


Unlike  the  methods  suggested  by  McCormick  [McC77j,  More  and  Sorensen  [MS79] 
and  Goldfarb  [G0I8O],  if  //*  is  indefinite,  the  next  iterate  lies  on  a  line  emanating 
from  Xk,  instead  of  an  arc.  At  a  given  iterate  Xk ,  we  will  consider  the  case  when  an 
initial  estimate  ak  £  [amin. oniax]  of  the  steplength  along  pk  is  given.  One  way  of 
generating  such  an  a k  is  discussed  in  Section  8.1. 

We  follow  McCormick  [McC77]  and  guarantee  a  sufficient  decrease  by  comparing 
/  to  a  damped  truncated  Taylor  series  consisting  of  two  or  three  terms.  The  resulting 
algorithm  may  be  viewed  as  an  Armijo-type  linesearch  [Arm66],  extended  to  the 
indefinite  case. 

Let  p  and  7  denote  preassigned  constants  such  that  p  £  (0,-*)  and  7  £  (0.1). 
Given  x k  and  a*;  £  [amin,amax]*  the  number  tj.  is  defined  to  be  the  smallest  non¬ 
negative  integer  i  such  that 


f{xk  +  I'otkPk)  <  f(*k)  +  Pl'Ok9kPk 

2  2.Q2 

f(xk  +  I'QkPk)  <  /(**)  +  tn'fhfJkPk  +  “ — kPkIlkPk 


if  dk  =  0;  (6.1a) 
if  dk±  0.  (6.1b) 


The  next  iterate  X/t+i  is  defined  as 


*fc+i  =  ik  +  7'k(*kPk-  (6.2) 

A  complete  description  of  the  modified  Newton  method  is  given  in  Algorithm  6.1. 
In  order  to  show  that  the  algorithm  is  well  defined,  we  present  two  lemmas,  which 
are  slightly  modified  forms  of  a  lemma  given  by  More  and  Sorenson  [MS79,  Lemma 
2.2]. 
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Specify  tolerances  e,  hm j„,  a,™,,,  armax,  tj,  p  and  7; 
k  *—  0;  converged  <—  false; 

repeat 

Evaluate  gk  and  //*; 

Factorize  Hk  to  obtain  nj,  n2l  Ru,  R\2  and  Hkjli\\; 

Compute  Sk  and  <4; 
if  (ni  =  n  or  4  =  0)  then 
Pk  —  Sfci 

else 

Compute  /4; 

Pfc  —  Sfc  +  Ac  4; 

end  if 

converged  »—  convergcnce-test; 
if  (not  converged )  then 

Compute  Ofc  €  [<tmin,<Wx]; 

Compute  4  so  that  /(x  fkakPk )  is  sufficiently  decreased; 
**+1  «-  **  +7'',a*P*; 

end  if 

until  converged; 


Algorithm  6.1.  A  modified  Newton  method  for  unconstrained  minimization 


Lemma  6.1.  If  p  €  (0,^)  is  a  given  constant  and  ip  is  a  continuously  dijferentiable 
univariate  function  such  that  <p'( 0)  <  0,  then  there  exists  a  positive  scalar  £  such 
that 

<r>(0  <  ¥>(0)  +  M<p'(°)C 

for  £  G  (0,0- 

Proof.  The  Taylor-series  expansion  for  a  positive  £  yields 

^MC)  -  vKO)  -  /np'(O)C)  =  (1  -  p)<p'(0)  +  </(0C)  -  *'(0), 

for  some  0  E  (0, 1),  and  it  follows  that 

^lim+  £(v>(0  -  *>(0)  -  W'(0)C)  =  (1  -  PV'(0)  <  0. 

Hence,  there  exists  a  positive  number  £  such  that 

<p(C)  -  <p(0)  -  w'(0)C  <  0 


for  all  (  e  (0,C).  | 
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Lemma  6.2.  If  p  E  (0,^)  is  a  given  constant  and  ip  is  a  twice -continuously  differ¬ 
entiable  univariate  function  such  that  v>'(0)  <  0  and  ip"( 0)  <  0,  then  there  exists  a 
positive  scalar  (  such  that 

/•2 

¥>(C)  <  vK0)  +  /i9'(0K  +  /i'V(0)y. 

for  c  €  (0,C). 

Proof.  The  Taylor-series  expansion  for  a  positive  Q  yields 

^MC)  -  ¥>(0)  -  ¥>'(0)C  -  mV'(O)^)  =  ^pV(O)  +  -  <p"m 

for  some  9  £  (0, 1),  and  it  follows  that 

^ioV  ^(V5(0  ~  s?(0)  "  ^(0)C  "  /'V'(O)y)  =  ^~yJ'(0)  <  0. 

Hence,  there  exists  a  positive  number  (  such  that 

¥>(C)  -  ^(0)  -  V»'(0)C  -  /tV'(0)y  <  0. 
for  all  £  G  (0,0-  The  proof  is  completed  by  noting  that 

y>'(0)C  <  Pt’WK- 

I 

We  can  now  show  that  a  sequence  {xk}k=o  generated  bv  (6.2)  is  well  defined. 
Lemma  6.3.  The  sequence  {xk}fLQ  is  well  defined. 

Proof.  First  assume  that  dk  is  nonzero.  It  follows  from  Lemmas  5.2  and  5.5  that 
9kPk  $  0  and  PkHkPk  <  If  we  define  v>(C)  =  f(*k  +  C Pk)>  we  have  /(())  =  <j'lpk 
and  ip"( 0)  =  p[llkpk ■  Lemma  6.2  implies  that  given  o*.  there  exists  a  nonnegative 
integer  ik  such  that  (6.1b)  holds. 

Assume  that  dk  is  zero  so  that  pk  =  sk.  If  sk  =  0,  then  (6.1a)  holds  for  ik  -  0. 
If  Sk  ^  0,  then  Lemma  5.1  implies  that  g[sk  <  0.  The  application  of  Lemma  6.1 
with  <p(f)  =  f(xk  +  C Pk)  implies  that  there  exists  an  ik  such  that  (6.1a)  holds.  | 

It  is  of  interest  to  study  the  behaviour  of  f(x)  along  pk.  It  follows  from  Taylor- 
series  expansion  that 

f(*k  +  C* :Pk)  =  /(**)+  < kUkPk  +  ■ jplHkPk  +  Tki-l'k'Pk.  U). 
where  the  remainder  term  is  given  by 

rk{xk,Pk,fk)  =  —pli^^fi-rk  +  9kCkPk )  -  V2/(xt))pfc  (6.3) 

for  some  9k  E  (0, 1 ). 

In  the  following  lemma,  we  establish  the  behaviour  of  the  remainder  term  as  k 
tends  to  infinity  and  (k  tends  to  zero. 
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15 


Lemma  6.4.  //lim*_00C*  =  0  then 

]im  r* (Ji-Bliil  =  0. 

Q 

Proof.  Using  properties  of  norms  and  (6.3)  we  get 

<  Mlj|v2/(x*  +  okCkPk)  -  v2/(^fc)ll- 

kk  z 

Assumption  A2  and  Lemma  5.6  imply  that  ||ijt||  and  ||pjt||  are  uniformly  bounded. 
Since  limfc_00  0t  =  0  it  follows  that  |0b|  is  uniformly  bounded.  Therefore,  there  exists 
a  compact  set  C  such  that  xk  £  C  and  xk  +  0kCkPk  €  C  for  all  k.  Since  C  is  compact 
and  /  is  twice  continuously  differentiable,  it  follows  that  ||V2/||  :  C  — *  &  is  uniformly 
continuous.  Hence,  for  all  e  >  0  there  exists  a  6  >  0  such  that  ||V2/(i)  —  V2/(y)||  <  e 
for  all  x,y  E  C  such  that  |jar  -  y||  <  6.  Since  limjt-.co  £*  =  0  and  ||pjt||  is  uniformly 
bounded,  for  each  6  there  exists  a  I\  such  that  \\0kCkPk\\  <  6  for  all  k  >  K .  | 

If  an  infinite  sequence  {ar/t}fcT0  ‘s  generated,  the  following  lemma  shows  that 
there  are  only  a  finite  number  of  iterates  where  a  direction  of  negative  curvature  is 
computed. 

Lemma  6.5.  For  any  sequence  there  must  exist  a  finite  K  such  that  dk  =  0 

for  all  k  >  K. 

Proof.  The  sequence  is  decreasing  and  Assumptions  A1  and  A2  imply 

that  this  sequence  is  bounded  from  below,  and  it  follows  that  {/(z*)}^o  converges 
to  a  limit  /.  Assume  that  there  exists  an  infinite  subsequence  such  that 

dk  ^  0  for  all  k  E  J ■  From  the  equation 

OO 

/  -  f{x  o)  =  -  /(**)) 

k=0 

and  the  fact  that  each  term  in  this  sum  is  nonpositive,  it  follows  that 

/  -  /(* o)  <  5Z(/(xfc+i)  -  f{Xk))- 
keJ 


From  Lemmas  5.2  and  5.5  we  obtain 

T 


ptffkPk  <  dTkHkdk  <  - 


for  all  k  E  J ■  The  inequalities  gkpk  <  0  and  >  ami„  imply  that 


f  -  f(x o)  <  Y,  ~ 

k£j 


2  ri 
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Since  /  is  finite  this  inequality  must  imply  that  ik  — >  oc  as  A:  — »  oo,  for  k  £  J. 
Further,  from  the  definition  of  4 


/(z/t+71*  >  /(*fc)  +  tn'k  lak<j7kpk  + 


fi2J2{,k  i]ak  i 


2 


PkHkPk- 


The  Taylor-scries  expansion  yields 

rk{Xk,Pk ,Yk~lc*k)  .  (1-/0  T  (l-/*2) 


>  - 


72(,fc  'Iq|  7**  lak^k^k  2 

Using  the  fact  that  <  0,  it  follows  from  Lemmas  5.2  and  5.5  that 

rk(xk,Pk, l'k~lak)  _  (1  -  /‘2)(1  ~  >l)t2f>nun  ,, 

- > - 2ij - '  ,<>'4) 

Taking  the  limit  in  (6.4)  noting  that  a*.  <  amAX  it  follows  from  Lemma  6.4  that 


PkHkPk- 


0  >  (  1  —  P2 )(  1  — 

2V 

which  is  a  contradiction.  Therefore,  there  exists  a  finite  A  such  that  dk  is  zero  for 
all  k  >  K.  | 


7.  Global  Convergence  Properties 


Using  the  established  lemmas  we  can  derive  the  following  theorem  concerning  the 
limit  points  of  the  sequence  {xk}fLu- 


Theorem  7.1.  If  an  infinite  sequence  *s  generated  as  defined  in  (6.2),  any 

limit  point  x  satisfies 


V/(.r)  =  0  and  Amin(V2/(*))  > 
where  h  =  inaxfrn ix,{(V2/(;r))„},  Amjn} 


ne2h 

V 


Proof.  Without  loss  of  generality,  it  may  be  assumed  that  the  sequence  {-ft- }(40 
converges  to  some  point  x.  Lemma  6.5  implies  that  there  exists  a  l\  such  that 
dk  =  0  for  all  k  >  A,  so  that  pk  =  .s/,.  for  k  >  A  .  Therefore,  Lemma  5.4  and  the 
continuity  of  V2}  imply  that 

Ami,i(V2/(*))>  . 

Assume  that  there  exists  an  I  such  that  ik  <  /  for  all  k  >  A\  It  follows  from 
Lemma  5.1  that 


f{Xk+\)  -  f(*k)  <  tn't'kUk'k  <  ~lll 1  r\ Oniinll/Z/tii2- 


7.  Global  Convergence  Properties 
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Since  /(x)  is  finite,  it  follows  that  V/(x)  =  0. 

If  the  integers  4  are  not  bounded  above,  then  it  may  be  assumed  without  loss 
of  generality  that  4  — ♦  oo  as  k  — ►  oo.  From  the  definition  of  4  it  follows  that 


f(xk  +  y'k  laksk)  -  f(xk)  >  py,k  1akgksk 
for  all  k  >  K .  The  Taylor-series  expansion  yields 

^slHksk  +  >  -(,  -  rtrfv 

2  Yk  Qk 

Using  Lemma  5.1  it  follows  that 

7'k~l<*k  t it  .  rk(xk,sk,y'k~1ak)  _  ,,  ,  „  ll2 

— ^ — 4uksk  +  —7— r~ - >  (1  -M)ci  W  ■ 

2  ‘Qjt 

Taking  the  limit  and  using  Lemmas  5.6  and  6.4  we  have  V/(x)  =  0  as  required. 


As  stated  in  the  following  corollary,  a  consequence  of  this  theorem  is  that  if  two 
consequent  iterates  are  identical,  a  limit  point  is  found,  since  all  subsequent  iterates 
are  identical. 


Corollary  7.1.  If  two  consecutive  iterates  xk  and  xk+l  are  identical ,  the  point  xk 
satisfies 


V/(x*)  =  0 


and 


•^min (^^f(xk)) 


nt2hk 

1 


I 


The  assumptions  made  are  not  sufficient  to  guarantee  that  the  sequence 
is  convergent.  Some  additional  conditions  are  needed  to  guarantee  that  a  generated 
sequence  has  a  unique  limit  point.  As  observed  by  More  and  Sorensen  [MS79],  if 
we  make  the  additional  assumption  that  there  are  only  a  finite  number  of  points  in 
S(xo)  where  the  gradient  vanishes,  the  following  result  may  be  used. 

Lemma  7.1.  (Ortega  and  Rheinboldt  [OR70])  Suppose  that  a  generated  se¬ 
quence  {x/t}£Lo  satisfies 

lim  (x*;+1  -  Xfe)  —  0  and  lim  V/( xk)  =  0. 

k—*oo  k— *oo 

Furthermore,  suppose  that  the  level  set  S(x o)  is  compact.  If  there  are  only  a  finite 
number  of  points  in  ^(xo)  where  the  gradient  vanishes,  then  there  exists  a  point  x 
such  that 

lim  xk  -  x  and  V/(x)  =  0. 

k— »oo 

Proof.  See  Ortega  and  Rheinboldt  [OR70,  Theorem  14.1.5].  | 

In  the  method  proposed  in  this  paper,  it  follows  from  Lemma  6.5  that  there  is 
a  K  such  that  pk  —  sk  for  k  >  A'.  From  Lemma  5.1  we  get  limjt_00Sfc  =  0.  Using 
Lemma  7.1  the  following  corollary  may  be  established. 
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Corollary  7.2.  If  there  are  only  a  finite  number  of  points  in  S(x0)  when  the  gra¬ 
dient  vanishes,  the  sequence  converges  to  a  point  x  satisfying 

V/(i)  =  0  and  Amin(V2/(:r))  >  -  — , 

V 

where  h  =  max{maxI{(V2/(i))lt}, | 

8.  Test  Problems  and  Numerical  Results 

A  Fortran  version  of  the  algorithm  was  run  on  two  types  of  test  problems:  nonlin¬ 
ear  least-squares  problems  and  barrier  problems.  The  computer  used  was  a  DEC 
VAXstation  II,  with  relative  machine  precision  cM«1.39  X  10~17. 

8.1.  Parameter  values 

Various  values  of  the  parameters  discussed  in  Section  2.2  were  investigated.  The 
results  presented  here  were  obtained  with  the  following  values: 


e 

10-6 

(specifies  smallest  acceptable  pivot  element) 

k  ■ 
“nun 

1(T3 

(smallest  acceptable  maximum  diagonal  dement  of  H\\) 

7 

10-3 

(tolerance  for  the  acceptance  of  d *) 

f*min 

10-1° 

(minimum  step  in  the  linesearch) 

«  max 

1015 

(maximum  step  in  the  linesearch) 

/' 

0.1 

(damping  factor  used  in  the  truncated  Taylor  polynomial) 

7 

0.5 

(parameter  for  the  backtracking). 

The  value  of  c  is  a  tradeoff  between  a  small  value  that  gives  the  Newton  search 
direction  when  H k  is  positive  definite,  and  a  value  large  enough  to  ensure  that  //n 
is  well-conditioned.  Theoretically,  a  very  small  value  of  c  is  preferred,  since  this  is 
more  likely  to  give  limit  points  that  satisfy  the  first-  and  second-order  necessary 
conditions  (see  Theorem  7.1).  However,  small  values  of  c  may  give  ill-conditioned 
Cholesky  factors  which  may  cause  inaccurate  search  directions.  Our  experiments 
indicate  that  the  overall  performance  of  the  method  is  not  sensitive  to  the  precise 
value  of  e. 

Given  the  value  of  c ,  hmin  is  selected  to  ensure  that  the  minimum  pivot  element 
is  always  greater  than  the  machine  precision.  In  the  experiments  presented  here, 
this  value  of  hmm  affected  only  two  iterates. 

The  value  of  T]  was  varied  by  several  orders  of  magnitude  from  the  chosen  value, 
without  changing  the  overall  performance.  The  value  selected  helps  to  avoid  com¬ 
puting  directions  of  negative  curvature  when  the  elements  of  the  Schur  complement 
are  all  small  in  magnitude. 

The  steplength  a/.-  is  computed  using  the  linesearch  procedure  of  Gill  <t  al. 
[GMSW79]  with  default  parameter  settings.  At  each  step  of  the  linesearch  both  the 
function  and  gradient  are  evaluated.  The  value  of  /{  above  was  chosen  to  ensure  that 
i/fc  =  0  is  accepted  in  most  cases.  Since  the  value  of  i’jt  in  (6.2)  differed  from  zero 
in  only  two  cases,  we  deduce  that  the  choice  of  7  is  not  crucial.  The  values  of 
and  Omax  were  designed  to  ensure  that  the  steplength  produced  bv  iiu  nnesearen 
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is  accepted  in  almost  all  cases.  In  practice,  a  sensible  choice  of  amax  can  improve 
efficiency. 

The  efficiency  of  the  linesearch  is  affected  by  the  initial  estimate  of  a.  Whenever 
dk  was  zero,  the  choice  of  a  =  1  was  found  to  be  adequate.  However,  the  unit  step 
tended  to  overestimate  the  accepted  step  when  dk  was  nonzero.  To  allow  for  this, 
an  initial  step  of  0.01  was  used  in  these  cases. 


8.2.  Least-squares  test  problems 

The  least-squares  test  problems  comprise  a  suite  of  45  problems,  given  by  Fra¬ 
ley  [Fra88].  Many  of  these  problems  are  known  to  be  hard  to  solve,  in  spite  of  their 
small  size.  A  summary  of  results  obtained  on  these  problems  when  applying  dif¬ 
ferent  least-squares  methods  and  methods  for  unconstrained  minimization  is  given 
in  Fraley  [Fra88].  Our  numbering  of  the  problems  is  the  same  as  in  Fraley’s  study. 
The  formulations  for  problems  l-35c  are  given  by  More  et  al.  [MGH81],  problems 
36a-36d  are  presented  in  Fraley  [Fra88],  problems  37-38  are  given  by  Salane  [Sal87], 
problems  39a-41<7  are  from  McKeown  [McK75],  problems  42a-43/  originate  from  de 
Villiers  and  Glasser  [dVG8l]  and  problems  44a-45e  are  from  Dennis  et  al.  [DGV85]. 

We  accept  as  a  solution  of  a  least-squares  problem  if  one  of  the  following  two 
conditions  are  met: 


Cl. 


dk  = 

INI  < 


0 


or 


C2. 


dk 

f(Xk-x)-f(Xk) 

IN  -  **- ill 
INI 


=  o 

< 

< 

< 


«m(i  +  l/(*k)l) 

^(i  +  INII) 

^(i  +  l/(xfc)D- 

The  first  condition  is  intended  to  accept  points  that  approximately  satisfy  the 
first-  and  second-order  necessary  conditions  for  optimality.  The  second  condition  is 
intended  to  test  when  the  sequence  {zfcjjfcLo  has  converged.  For  a  detailed  discussion 
of  convergence  criteria  for  unconstrained  optimization,  see  Gill  et  al.  [GMW81, 
Chapter  8]. 

In  some  problems  it  was  not  possible  to  evaluate  the  function  at  all  tual  points. 
In  these  cases,  the  trial  step  was  repeatedly  decreased  by  a  factor  7  (7  =  0.5)  until  / 
could  be  evaluated.  This  additional  backtracking  was  necessary  for  problems  42a  and 
43d  because  of  an  implicit  nonnegativity  constraint  on  one  variable;  and  for  problem 
19  because  of  overflow  during  the  calculation  of  the  objective  function.  Similarly, 
the  initial  step  at  the  starting  point  of  problem  11  was  repeatedly  decreased  until 
the  Hessian  and  gradient  were  not  numerically  zero.  These  trial  function  evaluations 
are  included  in  the  number  of  function  evaluations  shown. 

In  problems  2,  36a,  366  and  36d,  the  algorithm  failed  to  converge  within  the 
permitted  number  of  iterations.  In  all  cases,  this  non- convergence  is  a  consequence 
of  the  Hessian  being  very  ill-conditioned  at  the  solution.  Although  the  algorithm  did 
not  converge  in  these  cases,  the  objective  value  was  close  to  the  optimal  objective 
value. 
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nr 

name 

n 

"1 

h 

IM 

*(H„) 

k 

nf 

coni' 

1 

rose 

2 

2 

6.317050E-32 

6.0E-1S 

2.E+03 

22 

29 

i 

0 

0 

0 

2 

froth 

2 

2 

2.449213E+01 

1 . 7E-07 

1 . E+03 

7 

11 

i 

2 

0 

0 

3 

pOO lb 8 

2 

1 

2.837351E-0S 

1 . 1E+01 

1 . E+00 

600 

2079 

n 

5 

593 

1 

4 

brownba 

2 

1 

3.851860E-34 

2.8E-11 

1 . E+00 

4 

5 

7 

1 

3 

0 

5 

be  ale 

2 

2 

1 .007290E-23 

3.8E-12 

1 . E+02 

8 

18 

T 

0 

3 

3 

6 

jensan 

2 

2 

6.218109E+01 

2.0E-13 

8 . E+00 

10 

11 

T 

0 

0 

0 

7 

helix 

3 

3 

2.943716E-3S 

2.SE-17 

3 . E+02 

14 

25 

y 

0 

S 

5 

8 

bard 

3 

3 

4 . 107439E-03 

4.4E-16 

2. E+03 

14 

21 

y 

0 

1 

1 

9 

gauss 

3 

3 

5 .639664 E -09 

4.9E-11 

5 . E+01 

2 

3 

y 

0 

0 

0 

10 

■eyer 

3 

2 

2.661418E+04 

4.1E+01 

6 . E+07 

24 

74 

n 

4 

23 

2 

11 

giiif 

3 

3 

8.612303E-20 

2  OE-IO 

l.E+10 

151 

251 

y 

0 

8 

7 

12 

box 

3 

3 

8.939108E-30 

1  3E-15 

7 . E+03 

14 

19 

y 

0 

1 

0 

13 

sing 

4 

4 

1 . 300556E-13 

1  8E-09 

1 . E+08 

21 

22 

y 

0 

0 

0 

14 

wood 

4 

4 

O.OOOOOOE+OO 

O.OE+OO 

5. E+02 

39 

52 

y 

0 

1 

1 

15 

kovosb 

4 

4 

1  537528E-04 

3.6E-11 

2 . E+03 

9 

23 

y 

0 

4 

4 

16 

bro  wnclen 

4 

4 

4 . 291110E+04 

1 .6E-10 

6. E+01 

8 

9 

y 

0 

0 

0 

17 

osbl 

5 

5 

2.732447E-05 

3.6E-09 

1  E+09 

65 

147 

y 

0 

28 

28 

18 

exp6 

6 

5 

2  827825E-03 

1  .9E-09 

1 .  E+OS 

48 

136 

y 

1 

46 

37 

19 

osb2 

11 

11 

2.006887E-02 

2.2E-12 

4 . E+03 

16 

37 

y 

0 

6 

6 

20a 

watson06 

6 

6 

1.14383SE-03 

S.2E-13 

2.E+04 

12 

13 

y 

0 

0 

0 

20b 

watson09 

9 

9 

6.998801E-07 

7.SE-1S 

2. E+08 

13 

14 

y 

0 

0 

0 

20c 

watsonl2 

12 

11 

4 . 178499E-09 

6.4E-08 

8.  E+09 

31 

38 

y 

3 

32 

1 

20d 

watson20 

20 

13 

6.886510E-08 

1  8E-08 

2.E+11 

53 

107 

y 

3 

54 

0 

21a 

rosex 

10 

10 

3.158S2SE-31 

1.3E-14 

2 .  E+03 

22 

29 

y 

0 

0 

0 

21b 

rosex2 

20 

20 

6.317050E-31 

1.9E-14 

2 . E+03 

22 

29 

y 

0 

0 

0 

22a 

singx 

12 

12 

3.901678E-13 

3.2E-09 

l.E+08 

21 

22 

y 

0 

0 

0 

22b 

singx2 

20 

20 

1.284503E-13 

1 .2E-09 

2. E+08 

22 

23 

y 

0 

0 

0 

23a 

peni4 

4 

4 

1  124989E-05 

7.5E-11 

5. E+03 

34 

43 

y 

0 

0 

0 

23b 

penilO 

10 

10 

3.543826E-05 

1  3E-12 

1 .E+03 

36 

44 

y 

0 

0 

0 

24a 

penii4 

4 

4 

4.688147E-06 

1  IE-10 

2 . E+06 

110 

158 

y 

0 

0 

0 

24b 

peniilO 

10 

10 

1 . 468303E-04 

1.0E-09 

2.E+06 

93 

132 

y 

0 

0 

0 

25a 

▼ardial 

10 

10 

8 . 680345E-27 

2.6E-12 

1 .E+02 

14 

15 

y 

0 

0 

0 

25b 

vardim2 

20 

20 

O.OOOOOOE+OO 

O.OE+OO 

4  .  E+02 

18 

19 

y 

0 

0 

0 

28a 

trig 

10 

10 

1.721941E-24 

1  3E-12 

8. E+00 

7 

11 

y 

0 

1 

1 

26b 

trig2 

20 

20 

3 . 074585E-28 

1  2E-14 

4 .E+00 

11 

22 

y 

0 

5 

5 

27a 

brownall 

10 

10 

2 . 651544E-28 

2.3E-14 

2. E+03 

8 

9 

y 

0 

0 

0 

27b 

brownal2 

20 

20 

2 . 462302E-18 

3.3E-09 

1 . E+04 

9 

10 

y 

0 

0 

n 

28a 

discbvl 

10 

10 

9 . 287387E-25 

1.7E-13 

9. E+01 

3 

4 

y 

0 

0 

0 

28b 

discbv2 

20 

20 

1 . 182787E-25 

1 .7E-14 

7. E+02 

3 

4 

y 

0 

0 

0 

29a 

disciel 

10 

10 

1.997048E-22 

2.5E-11 

1 .E+00 

3 

4 

y 

0 

0 

0 

29b 

discie2 

20 

20 

3 . 293268E-22 

3.2E-11 

1 .E+00 

3 

4 

y 

0 

0 

0 

30a 

broytril 

10 

10 

8.955574E-33 

7.6E-16 

2. E+00 

6 

7 

y 

0 

0 

0 

30b 

broytri2 

20 

20 

2.051115E-32 

1 . IE- 15 

2. E+00 

6 

7 

y 

0 

0 

0 

31a 

broybanl 

10 

10 

6.032100E-27 

S.2E-13 

3. E+00 

8 

9 

y 

0 

0 

0 

31b 

broyban2 

20 

20 

6 . 067580E-27 

5.2E-13 

3. E+00 

8 

9 

y 

0 

0 

0 

32 

lin 

10 

10 

5 . OOOOOOE+OO 

9.7E-16 

1 .E+00 

1 

2 

y 

0 

0 

0 

33 

linl 

10 

1 

2.317073E+00 

2  8E-11 

1 .E+00 

1 

3 

y 

1 

2 

0 

34 

linO 

10 

1 

3 . 067568E+00 

4.0E-11 

1  E+00 

1 

3 

y 

1 

2 

0 

35a 

chebjqul 

8 

8 

1 .7S8437E-03 

5  4E-15 

2 .  E+01 

19 

34 

y 

0 

11 

11 

Table  8.1:  Results  for  least-squares  test  problems  1  35 n. 
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nr 

name 

n 

ni 

Jk 

1MI 

K(W)l) 

k 

CO  Tl  V 

#nj 

#du 

35b 

chebyqu2 

9 

9 

9.668790E-22 

8.SE-11 

2.E+02 

34 

84 

7 

0 

27 

27 

35c 

chebyqu3 

10 

10 

3.251977E-03 

6.4E-11 

2.E+02 

24 

46 

7 

0 

16 

16 

36a 

msqrt 1 i 

4 

4 

7.839519E-11 

5 .5E-06 

6.E+10 

600 

885 

n 

5 

0 

0 

36b 

nsqrt2i 

9 

7 

2.066297E-09 

1 . 8E-05 

2 .  E+08 

600 

2175 

n 

5 

421 

361 

36c 

maqrt3i 

9 

8 

6.499547E-16 

2.1E-08 

4 . E+08 

28 

44 

n 

4 

2 

1 

36d 

■sqrt4i 

9 

9 

2.10516SE-09 

1  .6E-06 

S.E+10 

600 

2154 

n 

5 

415 

361 

37 

haul 

2 

2 

1.043S01E+02 

1 .SE-12 

3.E+04 

5 

9 

7 

0 

1 

0 

38 

han2 

3 

3 

1 .983216E+01 

2.7E-10 

1 . E+06 

6 

11 

7 

0 

0 

0 

39a 

■ckla 

2 

2 

9.180060E-02 

1 .0E-17 

7.E+00 

3 

4 

7 

0 

0 

0 

39b 

»cklb 

2 

2 

9.180060E-02 

7.7E-12 

5.E+00 

3 

4 

7 

0 

0 

0 

39c 

■cklc 

2 

2 

9.180060E-02 

1 -3E-13 

2.E+00 

3 

4 

y 

0 

0 

0 

39d 

sckld 

2 

2 

9.180060E-02 

2.2E-17 

2 . E+00 

5 

6 

7 

0 

0 

0 

39« 

■ckle 

2 

2 

9.180060E-02 

1  OE-14 

6 . E+00 

7 

8 

7 

0 

0 

0 

39f 

mcklf 

2 

2 

9.180060E-02 

3.9E-18 

7. E+00 

10 

11 

7 

0 

0 

0 

39g 

■cklg 

2 

2 

9.180060E-02 

8.8E-10 

7. E+00 

12 

13 

T 

0 

0 

0 

40a 

■ck2a 

3 

3 

3.982776E-01 

4.2E-1S 

2 . E+Ol 

3 

4 

7 

0 

0 

0 

40b 

mck2b 

3 

3 

3.982776E-01 

1 .2E-10 

8 . E+00 

3 

4 

7 

0 

0 

0 

40c 

ack2c 

3 

3 

3.982776E-01 

6.6E-13 

2.E+00 

4 

5 

7 

0 

0 

0 

40d 

■ck2d 

3 

3 

3.982776E-01 

1 .OE-16 

4 . E+00 

5 

6 

7 

0 

0 

0 

40« 

mck2e 

3 

3 

3.982776E-01 

6.7E-17 

1 . E+01 

7 

8 

7 

0 

0 

0 

4  Of 

ack2f 

3 

3 

3.982776E-01 

5.4E-12 

2.E+01 

9 

10 

7 

0 

0 

0 

40g 

Bck2g 

3 

3 

3.982776E-01 

1  .4E-1S 

2 . E+01 

12 

13 

7 

0 

0 

0 

4ta 

Bck3a 

5 

S 

5  000001E-01 

8.3E-10 

4 . E+OO 

2 

3 

7 

0 

0 

0 

41b 

ack3b 

5 

5 

S.OOOOOIE-Ol 

6.7E-14 

3 . E+00 

3 

4 

7 

0 

0 

0 

41c 

Bck3c 

5 

5 

5. 000001 E-01 

4.8E-12 

3 . E+00 

7 

8 

7 

0 

0 

0 

41d 

ack3d 

5 

5 

5. 000001 E-01 

9.6E-15 

2 .  E+00 

8 

9 

7 

0 

0 

0 

41e 

Bck3e 

5 

5 

5. 000001 E-01 

3.7E  10 

2 . E+00 

10 

11 

7 

0 

0 

0 

41f 

ack3f 

5 

5 

5.000001E-01 

1.7E-11 

3. E+00 

13 

14 

7 

0 

0 

0 

«g 

Bck3g 

5 

5 

5.000001E-01 

2 . IE-12 

3. E+00 

16 

17 

7 

0 

0 

0 

43a 

devgla 

4 

4 

3.S93754E-28 

7.3E-12 

5 .  E+04 

16 

27 

7 

0 

2 

2 

42b 

devglb 

4 

4 

2.485558E-23 

1 .2E-09 

5  .  E+04 

28 

51 

7 

0 

6 

6 

42c 

d«»glc 

4 

4 

2.223602E-28 

S.3E-12 

5 . E+04 

21 

43 

7 

0 

5 

5 

4  2d 

daagld 

4 

4 

1 .910276E-28 

7  0E-12 

5 . E+04 

19 

26 

7 

0 

2 

2 

43a 

de»g2a 

5 

5 

1 .390367E-29 

1 .4E-12 

8 . E+06 

17 

26 

7 

0 

3 

3 

43b 

davg2b 

5 

S 

1 .352306E-25 

9.4E-11 

8 . E+06 

16 

29 

7 

0 

4 

4 

43c 

davg2c 

S 

5 

5.445605E-29 

5.4E-12 

8 . E+06 

13 

27 

7 

0 

6 

5 

43d 

de»g2d 

5 

5 

9 . 207747E-22 

2  .  IE-09 

8 . E+06 

29 

50 

7 

0 

5 

4 

43e 

devg2e 

5 

5 

1 .059680E-21 

2.8E-09 

8 . E+06 

17 

30 

7 

0 

5 

5 

43f 

davg2f 

5 

5 

3.254051E-30 

2  8E-13 

8 . E+06 

18 

32 

7 

0 

4 

4 

44a 

dgv6a 

6 

6 

3.982829E-24 

2.3E-11 

7 . E+06 

38 

121 

7 

0 

29 

29 

44b 

dgv6b 

6 

6 

1 .255706E-31 

1 .6E-14 

4 . E+02 

12 

26 

7 

0 

4 

4 

44c 

dgv6c 

6 

6 

8.742151E-25 

8.2E-10 

4.E+11 

392 

833 

7 

0 

385 

83 

44d 

dgv6d 

6 

6 

3.416587E-26 

1  2E-10 

2.E+10 

316 

764 

7 

0 

306 

126 

44e 

dg+6e 

6 

6 

1 . 306575E-30 

1  .IE-12 

1 . E+08 

175 

516 

7 

0 

164 

163 

45a 

dga8a 

8 

8 

5 . 542109E-26 

1 .4E-11 

7 . E+06 

39 

116 

7 

0 

31 

31 

45b 

dg+8b 

8 

8 

3.710801E-33 

6.7E-16 

2 . E+03 

16 

36 

7 

0 

7 

7 

45c 

dg»8c 

8 

8 

1  234906E-30 

1  3E-11 

9.E+11 

484 

1003 

7 

0 

480 

134 

45d 

dgv8d 

8 

8 

1  970398E-30 

3.4E-12 

4 . E+10 

480 

1053 

7 

0 

470 

174 

45a 

dgv8e 

8 

8 

3.968786E-31 

6.0E-13 

4 . E+08 

349 

953 

7 

0 

339 

338 

Table  8.2:  Results  for  least-squares  test  problems  35fr-45e. 
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nr 

name 

ltr1 

ic r2 

1  (T3 

ler* 

nr 

name 

ler1 

nr 2 

nr* 

nr* 

1 

rose 

5 

12 

16 

17 

35b 

chebyqu2 

20 

29 

29 

31 

2 

froth 

2 

3 

4 

4 

35c 

chebyqu3 

n 

14 

19 

21 

3 

poelbs 

1 

4 

a 

235 

36a 

nsqrtl i 

2 

4 

6 

10 

4 

brownbs 

3 

3 

3 

3 

36b 

■sqrt2i 

2 

4 

6 

10 

5 

beale 

2 

3 

4 

5 

36c 

msqrt3i 

2 

3 

5 

7 

6 

jensan 

2 

3 

5 

6 

36d 

msqrt4i 

2 

4 

6 

10 

7 

helix 

4 

6 

7 

8 

37 

haul 

1 

2 

2 

3 

8 

bard 

2 

4 

6 

8 

38 

han2 

1 

2 

3 

3 

9 

gauss 

1 

1 

1 

1 

39  a 

mckl  a 

1 

1 

1 

1 

10 

meyer 

1 

2 

2 

3 

39b 

mcklb 

1 

1 

1 

2 

11 

gulf 

1 

1 

3 

6 

39c 

mcklc 

1 

1 

1 

2 

12 

box 

2 

3 

5 

7 

39d 

mckld 

2 

2 

3 

3 

13 

sing 

2 

3 

5 

6 

39e 

nckle 

2 

3 

4 

5 

14 

BOOd 

1 

3 

4 

26 

39f 

BCklf 

2 

3 

5 

6 

15 

kouosb 

3 

5 

6 

7 

39g 

BCklg 

2 

3 

5 

6 

16 

brownden 

2 

4 

5 

5 

40  a 

mck2a 

1 

1 

1 

1 

17 

osbl 

12 

28 

32 

37 

40b 

mck2b 

1 

1 

2 

2 

18 

exp6 

10 

2o 

27 

32 

40c 

ack2c 

1 

2 

2 

2 

19 

osb2 

7 

9 

11 

12 

40d 

Bck2d 

1 

2 

2 

3 

20a 

watson06 

1 

2 

5 

7 

40e 

Bck2e 

2 

3 

4 

4 

20b 

watson09 

1 

2 

5 

8 

40f 

Bck2f 

2 

3 

5 

6 

20c 

watsonl2 

1 

3 

4 

6 

40g 

mck2g 

2 

3 

5 

6 

20d 

watson20 

4 

7 

12 

17 

41a 

nck3a 

1 

1 

1 

1 

21a 

rosex 

S 

12 

16 

17 

41b 

mck3b 

1 

1 

1 

1 

21b 

rosex2 

S 

12 

16 

17 

41c 

mck3c 

2 

3 

4 

5 

22a 

singx 

2 

3 

5 

6 

41d 

mck3d 

1 

3 

4 

5 

22b 

singx2 

2 

3 

5 

6 

41  e 

ack3e 

2 

4 

5 

6 

23a 

peni4 

2 

3 

5 

6 

41  f 

ack3f 

2 

3 

5 

6 

23b 

penilO 

2 

3 

5 

6 

4lg 

Bck3g 

2 

3 

5 

6 

24a 

penii4 

2 

2 

3 

4 

42a 

d erg la 

8 

10 

12 

12 

24b 

peniilO 

2 

3 

4 

5 

42b 

devglb 

20 

23 

24 

25 

25a 

vardiml 

2 

*> 

5 

6 

42c 

devglc 

14 

16 

17 

18 

25b 

vardim2 

2 

3 

5 

6 

42d 

devgld 

11 

14 

14 

15 

26a 

trig 

3 

4 

4 

5 

43a 

devg2a 

2 

3 

5 

6 

26b 

trig2 

6 

7 

8 

8 

43b 

de vg2b 

3 

5 

7 

9 

27a 

brounall 

1 

1 

1 

1 

43c 

devg2c 

1 

4 

7 

7 

27b 

bro»nal2 

1 

1 

1 

1 

43d 

devg2d 

4 

6 

8 

10 

28a 

discbvl 

1 

1 

1 

2 

43e 

devg2e 

2 

4 

7 

9 

28b 

discb*2 

1 

1 

1 

2 

43f 

devg2f 

3 

6 

8 

10 

29a 

disciel 

1 

1 

2 

2 

44a 

dgv6a 

11 

22 

JO 

33 

29b 

discie2 

1 

1 

2 

2 

44b 

dgv6b 

3 

6 

8 

9 

30a 

broytril 

1 

2 

3 

3 

44c 

dgv6c 

1 

5 

29 

119 

30b 

broytri2 

1 

2 

3 

3 

44d 

dgv6d 

1 

3 

24 

97 

31a 

broybanl 

2 

3 

4 

5 

44e 

dgv6e 

1 

3 

14 

52 

31b 

broyban2 

2 

3 

4 

5 

45a 

dgv8a 

11 

22 

29 

33 

32 

lin 

1 

1 

1 

1 

45b 

dgv8b 

2 

5 

9 

12 

33 

lini 

1 

1 

1 

1 

45c 

dgv8c 

6 

9 

27 

99 

34 

linO 

1 

A 

; 

1 

45d 

dgv8d 

3 

6 

20 

89 

35a 

chebyqul 

10 

15 

is 

16 

45e 

dgv8e 

3 

4 

14 

57 

fable 

S.3:  Nil 

mber 

of  iterations 

required 

to  reduce 

(/(•nt)  - 

f(S 

))/(/( 

•TO  )  “  . 

below  four  different  tolerances. 
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Ill-conditioning  was  also  responsible  for  the  failure  in  problems  10  and  36c.  In 
these  cases,  the  algorithm  terminated  because  of  a  failure  in  the  linesearch.  Again, 
the  objective  value  has  been  reduced  significantly.  In  problem  36c  the  algorithm 
terminated  at  a  point  very  close  to  the  solution.  In  problem  10  the  Hessian  at  the 
final  iterate  is  positive  definite  but  very  ill-conditioned. 

The  results  of  the  computer  runs  are  summarized  in  Tables  8.1  and  8.2.  The 
column  headings  have  the  following  meaning: 
nr  Problem  number. 

name  Problem  name. 

n  Number  of  variables. 

Dimension  of  Hu  at  the  fined  iterate  x*. 

/*  Value  of  the  objective  function  the  final  iterate  X*. 

Hi/* ||  Norm  of  the  gradient  at  the  final  iterate  xjt. 

k(//„)  Estimate  of  the  condition  number  of  the  final  Hi j. 


k  Number  of  iterations. 

nf  Number  of  function  evaluations. 

conv  Convergence  information. 

y  0  Convergence  criteria  Cl  satisfied  with  n2  =  0. 

y  1  Convergence  criteria  Cl  satisfied  with  n2  >  0. 

y  2  Convergence  criteria  C2  satisfied  with  n2  =  0. 

y  3  Convergence  criteria  C2  satisfied  with  n2  >  0. 

n  4  Nonconvergent  due  to  failure  in  linesearch. 
n  5  Nonconvergent  due  to  too  many  iterations  (>  600). 
#n2  Number  of  iterates  where  n2  was  positive. 

#du  Number  of  iterates  where  d  was  used. 


Our  experience  from  working  on  these  problems  is  th;.t  it  is  possible  to  reduce 
the  value  of  the  objective  function  significantly  in  a  relatively  small  number  of 
iterations,  as  illustrated  in  Table  8.3.  However,  stringent  convergence  criteria  such 
as  those  used  here  may  not  always  be  achievable  if  the  Hessian  is  iil-conditioned  at 
the  solution. 


8.3.  Barrier  test  problems 

The  test  problems  with  a  general  objective  form  originate  from  the  barrier  function 
approach  of  Resende  et  al.  [RKR89]  for  solving  0-1  integer  programming  problems. 
The  aim  of  this  approach  is  to  find  a  point  x*  with  all  components  ±1  in  the  set  F, 
where  F  is  defined  to  be 


->) 

1  2 b-  Ae  +  e  ) 

x  : 

x  < 

>. 

i) 

\  e  ) 

for  an  m  X  n  matrix  A  and  an  n-vector  b.  We  consider  the  case  where  all  elements 
of  A  and  6  are  integers.  The  vector  e  denotes  a  suitably  dimensioned  vector  with 
unit  components. 

If  the  composite  matrix  and  vector  associated  with  the  inequalities  of  (8.1)  are 
denoted  by  A  and  6,  we  may  write  F  —  {x  :  Ax  <  6}. 
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This  integer  feasibility  problem  is  converted  into  a  smooth  minimization  problem. 
The  function  to  be  minimized  is  the  barrier  function  /  defined  by 

i  .  m+2n 

f(x)  =  -ln(n  -  xTx) - -  ln(e,T(6  -  Ax)) 

i  m  +  lu  — f 

*=i 

(see  Resende  et  al.  [RKR89]).  This  barrier  function  does  not  satisfy  the  assumptions 
of  Section  2.1,  since  the  function  is  only  defined  for  x  such  that  Ax  <  b.  Moreover,  as 
is  shown  in  the  appendix,  the  barrier  function  tends  to  minus  infinity  for  a  sequence 
converging  to  a  point  with  all  components  ±1.  Nevertheless,  these  functions  are 
useful  as  test  problems  because  they  have  many  local  minimizers  and  exhibit  many 
directions  of  negative  curvature.  (Moreover,  it  was  also  of  interest  to  see  if  the 
algorithm  was  able  to  locate  a  point  in  F  with  all  components  ±1.) 

Three  different  test  problems  were  used,  and  for  each  of  them  the  set  of  points 
in  F  with  all  components  ±1  consists  of  only  one  point,  x* . 

Data  for  barrier  test  problem  1: 


(  -2  -1 

-1 

0 

0 

0  > 

(  _1 

\ 

-1  0 

0 

—  2  - 

■1 

0 

-2 

A  = 

0  -1 

0 

-1 

0 

-1 

,  b  = 

-2 

0  0 

-2 

0  - 

1 

-1 

-1 

\  3  2 

3 

4 

2 

3  ! 

l  8 

a) 

*0  = 

(  -0.90 

0.76 

-0.76 

0.64 

0.20 

o 

M 

o’ 

1 

b) 

1 0  = 

(  -0.86 

0.64 

-0.64 

0.46 

-0.20 

0.20 

)T 

**  =  (  • 

-1 

1  -1 

1  1  -1  )' 

Data  for  barrier 

test  problem  2: 

A=(  ' 

2 

4 

3 

\  . 

(  5 

\ 

)  »  ^ 

= 

, 

\  -4 

-3 

-4  - 

-2 

J 

V  -8 

J 

a) 

xo  =  ( 

0.90 

-0.10 

0.45 

\T 

-0.95  )  , 

b) 

i-o  =  ( 

0.88 

0.08 

0.34 

-0.94  )  , 

* 

X 

=  ( 

1  -I 

1  -1 

■  )r 

Data  for  barrier  test  problem  3: 


(  4  8 

2 

4  \ 

(  U 

A  =  1  2  4 

4 

8  ), 

b  =  13 

\  -4  -8  ■ 

i 

-  / 

V  -9 

a) 

xo  =  (  -0.40 

0.80 

0.20 

-0.99  )T 

b) 

Xo  =  (  -0.34 

0.78 

0.12 

-0.99  )r 

*’  =  (  -1 

1 

1  - 

-■  )T 
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nr 

name 

n 

"I 

h 

11**11 

*(«») 

k 

»/ 

conv 

#n+ 

*du 

46* 

barlogla 

6 

5 

-1.841628E+O0 

1 . 3E+07 

3.E*O0 

18 

22 

j 

6 

19 

19 

46b 

barloglb 

6 

6 

7.626996E-01 

2.8E-12 

3.E+01 

7 

11 

T 

0 

3 

3 

47 a 

barlog2a 

4 

3 

-1  122621E+00 

9.0E+O6 

2.E*00 

16 

17 

7 

6 

17 

17 

47b 

barloglb 

4 

4 

S. 80571 SE-01 

3.SE-14 

l.E+01 

7 

8 

J 

0 

1 

1 

46* 

barlog3a 

4 

3 

-1.996615E+00 

6.0E+06 

3.E+00 

16 

17 

1 

6 

17 

17 

48b 

barlog3b 

4 

4 

1 . 433882E-01 

8.9E-15 

8.E+O0 

10 

14 

r 

0 

2 

2 

49* 

bar  la 

6 

5 

1  618634E-01 

2.3E+06 

3.E+00 

17 

21 

i 

6 

18 

18 

49b 

barlb 

6 

6 

2.1440S6E+00 

7.0E-12 

3.E+01 

7 

11 

T 

0 

3 

3 

50a 

bar2a 

4 

3 

3.771148E-01 

8  2E+05 

2.E+00 

14 

15 

J 

6 

15 

15 

50b 

barlb 

4 

4 

1 .7870S9E+00 

4.8E-13 

1.1*01 

7 

8 

1 

0 

1 

1 

51a 

bar3a 

4 

3 

1.43S501E-01 

1 .1E+06 

2.E+00 

15 

16 

J 

6 

16 

16 

51b 

bar3b 

4 

4 

1 .1S4178E+00 

1.1E-11 

8.E+00 

10 

13 

1 

0 

3 

3 

Table  8.4:  Results  for  barrier  test  problems 

For  each  of  the  three  test  problems,  two  starting  points  were  used.  Problems 
46a,  47a  and  48a  correspond  to  starting  points  for  which  the  sequence  {xjfc}£i.0  con¬ 
verged  to  x* .  Problems  466,476  and  486  correspond  to  starting  points  for  which  the 
sequence  converged  to  a  local  minimizer  of  the  barrier  function. 

Problems  49,  50  and  51  are  similar  to  problems  46,  47  and  48,  except  that  the 
objective  function  is  the  argum'  it  of  the  logarithmic  barrier  function,  i.e., 

_ jlj 

/(X)  =  (ir=l2n  ej(6  -  y4x))1/(m+2n> 

For  the  same  starting  points,  the  same  final  points  were  reached  in  approximately 
the  same  number  of  iterations. 

The  barrier  problems  are  not  truly  unconstrained,  since  the  objective  function 
is  only  defined  for  x  such  that  Ax  <  6.  To  allow  for  this,  the  iteration  was  modified 
so  that  Qjn*x  was  made  subject  to  being  no  greater  than  99.99%  of  the  step  to  the 
boundary  of  F.  If  d k  was  zero,  the  unit  initial  steplength  was  chosen,  and  when  <4 
was  nonzero,  a  trial  value  of  0.8amax  was  used.  The  trial  step  was  accepted  if  the 
directional  derivative  was  still  negative.  Otherwise,  the  same  linesearch  as  used  for 
the  least-squares  test  problems  was  used. 

The  following  criterion  was  used  to  decide  when  a  point  in  F  with  all  components 
±1  had  been  reached, 

C3.  max,  jl  -  \ejxk\}  < 

The  results  for  the  barrier  test  problems  are  presented  in  Table  8.4.  The  column 
headings  are  the  same  as  for  the  least-squares  test  problem,  and  the  only  difference 
is  that  convergence  criteria  C3  is  denoted  by  “y  6”  in  the  conv  column. 

8.4.  Practical  behaviour  of  the  computed  directions 

In  Section  5  of  this  paper  theoretical  properties  of  the  computed  directions  st t,  <4  and 
Pk  are  established.  It  is  shown  in  Lemma  5.3  that  the  ratio  between  the  curvature 
along  the  direction  of  negative  curvature,  <4,  and  the  smallest  eigenvalue  of  Hk  is 
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uniformly  bounded  away  from  zero.  Lemmas  5.5  and  5.6  imply  that  whenever  d* 
is  nonzero,  the  ratio  between  the  curvature  along  p <.  and  the  smallest  eigenvalue  of 
Hk  is  also  uniformly  bounded  away  from  zero. 

In  order  to  measure  the  magnitude  of  the  curvature  along  dk,  it  is  compared 
to  the  smallest  eigenvalue  of  Hk-  Figure  8.1  shows  the  ratio  between  the  curvature 
along  dk  and  the  smallest  eigenvalue  of  Hk  for  those  iterates  of  the  least-squares 
problems  where  dk  was  nonzero.  Figure  8.2  shows  the  corresponding  data  for  the 
barrier  problems.  If  10%  of  the  best  possible  curvature  is  regarded  as  “good”,  we 
see  that  this  “good”  curvature  is  computed  in  95%  of  the  cases  for  the  least-squares 
problems  and  in  98%  of  the  cases  for  the  barrier  problems. 

Ideally,  if  dk  is  nonzero,  pk  should  be  both  a  nontrivial  direction  of  negative 
curvature  and  a  descent  direction  that  is  not  too  orthogonal  to  the  negative  gradient. 
Unfortunately,  a  direction  that  simultaneously  has  both  these  properties  may  not 
exist.  In  Figures  8.3  and  8.4  we  give  the  ratio  between  the  curvature  along  pk  and 
the  curvature  along  dk  for  iterates  for  which  dk  was  nonzero.  Since  dk  is  intended  to 
be  a  good  direction  of  negative  curvature,  this  ratio  gives  an  idea  of  how  much  of  the 
best  possible  curvature  is  achieved  along  pK ■  The  data  for  the  least-squares  problems 
is  given  in  Figure  8.3;  data  for  the  barrier  problems  is  given  in  Figure  8.4.  Note  that 
for  both  classes  of  problem,  the  ratio  is  close  to  one  in  most  cases.  Moreover,  a  ratio 
greater  than  one  is  possible  if  |jp*;||  <  |[<4||.  A  ratio  greater  than  0.1  is  achieved 
in  98%  of  the  cases  for  the  least-squares  problems  and  in  99%  of  the  cases  for  the 
barrier  problems. 

The  direction  s*.  is  intended  to  be  a  good  descent  direction.  In  order  to  inves¬ 
tigate  whether  this  property  is  inherited  by  pk,  the  ratio  of  the  cosine  between  s* 
and  gk  and  the  cosine  between  pk  and  s*  was  measured.  These  ratios  are  given  in 
Figure  8.5  for  the  least-squares  problems  and  in  Figure  8.6  for  the  barrier  problems. 
In  general,  this  ratio  is  not  as  close  to  one  as  the  curvature  ratios.  However,  a  ratio 
greater  than  0.1  is  obtained  in  86%  of  the  cases  for  the  least-squares  problems  and 
in  93%  of  the  cases  for  the  barrier  problems.  We  believe  that  the  main  reason  for 
this  ratio  not  bung  as  close  to  one  as  the  other  two  ratios,  is  that  ||s/-||  tends  to  zero 
as  the  solution  is  approached,  but  a  nonzero  ||djt|[  will  be  of  order  one.  Therefore, 
because  of  the  way  pk  is  constructed,  dk  will  usually  dominate  s^  so  that  Pk  ~  <4-- 

It  is  noticeable  that  the  barrier  ratios  seem  better  than  the  least-squares  ratios. 
This  is  probably  due  to  the  fact  that  even  though  both  problem  classes  contain 
highly  nonlinear  problems,  the  condition  number  of  Hu  is  generally  smaller  for  the 
barrier  problems. 
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Figure  8.1:  Least-squares  problems:  Ratio  of  the  curvature  along  <4  to  the  smallest 
eigenvalue  of  the  Hessian.  Percentage  out  of  2013  observations. 


Figure  8.2:  Barrier  problems:  Ratio  of  the  curvature  along  <4  to  the  smallest  eigen 
value  of  the  Hessian.  Percentage  out  of  115  observations. 
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Figure  8.3:  Least-squares  problems:  Ratio  of  the  curvature  along  />*  to  the  curvature 
along  dk-  Percentage  out  of  2013  observations. 


Figure  8.4:  Barrier  problems:  Ratio  of  the  curvature  along  pk  to  the  curvature  along 
dk .  Percentage  out  of  115  observations. 


8.  Test  Problems  and  Numerical  Results 
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Figure  8.5:  Least-squares  problems:  Ratio  of  the  cosine  between  p *  and  g ^  to  the 
cosine  between  Sk  and  g^.  Percentage  out  of  2013  observations. 


Figure  8.6:  Barrier  problems:  Ratio  of  the  cosine  between  pk  and  g *  to  the  cosine 
between  s *  and  g *.  Percentage  out  of  115  observations. 
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9.  Discussion 

This  report  describes  a  modified  Newton  method  for  unconstrained  minimization. 
At  each  iteration  a  positive-definite  portion  of  the  Hessian  is  factorized  using  the 
Cholesky  algorithm.  A  descent  direction  is  computed  if  the  gradient  is  nonzero,  and 
a  direction  of  negative  curvature  is  computed  if  the  Hessian  is  sufficiently  indefinite. 
A  linear  combination  of  these  vectors  define  a  search  direction,  along  which  the 
next  iterate  is  found.  Theoretical  properties  of  the  algorithm  are  established,  and 
numerical  data  from  a  set  of  test  problems  are  included. 

As  the  algorithm  is  stated,  if  the  direction  of  negative  curvature  is  nonzero,  it  is 
always  used  to  form  pk ■  From  a  practical  point  of  view  it  is  not  clear  if  this  is  the 
best  use  of  dk .  It  is  possible  to  define  a  number  G  such  that  whenever  \\gk\\  >  G, 
we  may  discard  the  direction  of  negative  curvature  and  still  obtain  the  convergence 
properties  of  Section  7.  This  alternative  algorithm  would  allow  a  scheme  for  control¬ 
ling  that  the  cosine  between  Pk  and  gk  is  not  much  smaller  than  the  cosine  between 
Sk  and  gk,  hereby  ensuring  that  the  search  direction  is  not  significantly  closer  to 
orthc0onality  to  the  negative  gradient  than  the  descent  direction.  The  significance 
of  utilizing  a  direction  of  negative  curvature  was  investigated  by  rerunning  the  test 
problems  with  /?*.  set  to  zero  for  all  k.  On  those  problems  where  a  direction  of  nega¬ 
tive  curvature  previously  had  been  used,  the  number  of  iterations  required  to  satisfy 
the  reduction  of  fk  given  in  Table  8.3  tended  to  increase.  Moreover,  the  number 
of  problems  for  which  the  convergence  criteria  were  not  met  increased  from  six  to 
twelve. 

Finally,  we  note  that  the  convergence  results  given  in  Section  7  imply  convergence 
to  a  point  where  the  gradient  vector  is  zero  and  the  Hessian  matrix  has  a  smallest 
eigenvalue  greater  than  a  small  negative  number.  Since  a  point  satisfying  the  second- 
order  necessary  conditions  has  nonnegative  Hessian  eigenvalues,  it  might  appear  that 
the  convergence  results  are  somewhat  less  satisfactory  than  those  usually  given  for 
methods  of  this  type.  However,  we  observe  that  the  magnitude  of  the  bound  on  the 
smallest  eigenvalue  may  be  made  as  small  as  required  by  assigning  a  suitably  small 
value  for  the  parameter  t.  Small  values  of  c  affect  only  the  numerical  performance 
of  the  method,  and  not  the  theoretical  convergence  properties.  Moreover,  it  may  be 
observed  from  Tables  8.1,  8.2  and  8.4  that  in  most  cases  the  iterates  converged  to  a 
point  where  n\  was  equal  to  n,  that  is  the  Hessian  was  positive  definite.  The  only 
exceptions  are  problems  where  the  Hessian  at  the  solution  is  very  ill-conditioned, 
singular  or  undefined  (for  some  of  the  barrier  problems). 

Our  overall  conclusion  from  the  results  is  that  it  was  possible  to  reduce  the  value 
of  the  objective  function  significantly  in  a  rather  small  number  of  iterations  by  using 
directions  of  negative  curvature  whenever  the  Hessian  was  indefinite.  However,  to 
meet  stringent  convergence  criteria  was  not  always  possible  when  the  Hessian  was 
very  ill-conditioned  or  singular  at  the  solution.  Also,  by  running  the  algorithm 
without  using  the  direction  of  negative  curvature,  we  have  the  impression  that  the 
ability  to  compute  a  direction  of  negative  curvature  is  not  only  a  theoretical  tool 
to  show  convergence,  but  also  a  helpful  device  in  order  to  improve  robustness  and 
efficiency. 
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A.  Appendix:  Properties  of  the  Barrier  Test  Problems 

The  barrier  test  problems  originate  from  a  barrier  function  approach  proposed  by 
Resende  et  al.  [RKR89]  for  solving  0-1  integer  programs.  Given  an  m  x  n  matrix  A 
and  an  n-vector  6,  the  0-1  integer  program  concerns  finding  a  z  in  3f?n  that  belongs 
to  the  set  F\ ,  where 

F\  =  {z  :  Az  <  6,  Zi  =  0  or  z,  =  1  for  i  =  1, . . .  n}. 


We  shall  consider  only  the  case  where  A  and  b  have  integer  coefficients,  since  the 
analysis  is  simpler  in  this  case. 

Applying  the  linear  transformation  x  —  2z  -  e,  the  0-1  problem  is  transformed 
to  an  equivalent  problem  where  all  components  of  the  integer  solution  are  ±1.  The 
transformed  problem  is  converted  into  a  smooth  minimization  problem  by  seeking 
values  in  the  set  F  given  by 


f  A\ 

*  26  -  Ae  -f  e  ^ 

x  : 

-I 

x  < 

e 

\  t) 

\  e  ) 

> 

{x  :  Ax  <  6}  . 


The  aim  is  to  find  a  point  x*  in  F  with  components  ±1  by  minimizing  the  barrier 
function 

i  ,  m+2  n 

/( x)  =  -  ln(n  -  xTx)  -  —  — —  ln(ef(6  -  A*)) 

i=i 

(see  Resende  et  al.  [RKR89]). 

Although  the  barrier  function  is  not  defined  for  a  point  x *  with  components  ±1, 
the  following  lemma  shows  that  the  barrier  function  has  a  global  minimizer  at  x* , 
since  there  exist  sequences  converging  to  x*  for  which  the  function  tends  to  minus 
infinity. 


Lemma  A.l.  Assume  that  the  matrix  A  has  at  least  one  row.  Furthermore,  assume 
that  {xjt}£l0  converges  to  a  point  x*  G  F  such  that  x*  has  all  components  ±1.  If 
Aik  <  6  for  all  k,  and  if  there  exists  a  positive  constant  c  independent  of  k,  such 
that  it  holds  for  all  k  that 


min 


I fSffc  ~  »*)l 

Ik*  -  **|| 


>  c, 


then  limjt-oo  /kfc)  =  -oo. 
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Proof.  For  clarity,  the  iteration  subscript  k  is  dropped  when  subscript  i  is  used  to 
denote  a  particular  component  of  n. 

Using  properties  of  logarithms,  f(x)  may  be  rewritten  as 


/(*)  = 


■  In 


(E?=i(i  -  j-))m/2+n 


m  +  2n  Vn,=i  <{2b  -  Ae  +  e  -  Ax)  n"=.(l  ~  *,2) 


(A.l) 


Since  x*  £  F  with  all  components  ±1,  it  follows  that  26  -  Ae  -  Ax*  >  0  and 
consequently 

lim  ej(2b  -  Ae  +  e  -  Aik )  >  1. 

fc— -*oo 

Therefore,  it  may  be  assumed  without  loss  of  generality  that  Ax <  26  —  Ae  +  e  for 
all  k.  If  r  denotes  the  vector  whose  i-th  component  is  given  by  r,  =  |x,  —  x*|  we  get 


(Er=iQ-*,2))n 


(E"=ir.-(2-r,-))" 

nr=i  r»(2 — ft)  ’ 


where  it  without  loss  of  generality  may  be  assumed  that  r,  <  1  for  all  i.  Dividing 
both  numerator  and  denominator  by  the  positive  quantity  (eTr)n  and  using  the 
existence  of  the  constant  c,  we  derive  the  inequality 


E^('2-r«) 


1=1 


(  2n3,/2  \  ” 


nerr(2  T,'> 


i=i 


If  m  >  0  then  limjt_00(n  —  =  0,  and  it  follows  that  the  argument  of  the  log¬ 

arithm  in  (A.l)  tends  to  zero  as  k  tends  to  infinity.  Consequently,  lim^oo  /(xj.)  = 
-oo  as  required.  | 


The  following  lemma  shows  that  there  is  a  one-to-one  correspondence  between 
points  in  F  with  all  components  ±1  and  points  in  T). 

Lemma  A. 2.  The  set  F\  is  nonempty  if  and  only  if  there  exists  a  point  in  F  with 
all  components  ±1. 

Proof.  Assume  that  ~  €  F\.  A  linear  transformation  x  =  2 z  —  c  yields  x  £  F  with 
all  components  ±1. 

Assume  that  x  £  F  with  all  components  ±1.  Let  x  =  2z  -  e,  and  it  follows  that 
x  is  a  vector  with  all  components  zero  or  one,  for  which  Az  <  b  +  -je.  However,  Az 
and  6  are  integer  vectors,  and  therefore  it  holds  that  Az  <  6.  Consequently,  z  £  F\. 


Lemma  A.l  implies  that  the  barrier  function  has  a  global  minimizer  at  any  point 
x*  in  F  with  all  components  ±1.  From  Lemma  A. 2  it  follows  that  if  such  a  global 
minimizer  is  found,  a  point  in  F\  may  be  identified,  thereby  providing  a  solution  of 
the  original  0-1  integer  program. 
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